Distant Supervised Relation Extraction with Wikipedia and Freebase
Authors
Abstract
In this paper we discuss a new approach to extracting relational data from unstructured text without the need for hand-labeled data. So-called distant supervision has the advantage that it scales to large amounts of web data and therefore fulfills the requirements of current information extraction tasks. As opposed to supervised machine learning, we train generic, relation- and domain-independent extractors on the basis of database entries. We use Freebase as a source of relational data and a Wikipedia corpus tagged with unsupervised word classes. In contrast to previous work in the field of distant supervision, we do not rely on preprocessing steps that involve supervised learning. This work consists of three parts: a distant supervised Named Entity Recognizer (NER), a distant supervised classifier that recognizes sentences in which a certain relation between two objects is described, and the combination of both, which allows us, for example, to contribute new instances to Freebase. The performance of the NER is too low for the combined method to produce usable results. Still, the subcomponents can be used independently.
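As a rough illustration of the distant supervision idea described in the abstract, the sketch below labels sentences by matching them against database entries. It is a minimal sketch and not the paper's method: the `Fact` type, the `label_sentences` function, the substring matching, and the toy fact and sentences are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple


class Fact(NamedTuple):
    """One relational database entry (hypothetical stand-in for a Freebase fact)."""
    subject: str
    relation: str
    obj: str


def label_sentences(sentences: List[str], facts: List[Fact]) -> List[Tuple[str, str, str, str]]:
    """Distant supervision heuristic: a sentence that mentions both arguments
    of a known fact is taken as a (noisy) positive example of that relation."""
    labeled = []
    for sentence in sentences:
        lowered = sentence.lower()
        for fact in facts:
            # Naive case-insensitive substring match; a real system would
            # need proper entity recognition here (an assumption of this sketch).
            if fact.subject.lower() in lowered and fact.obj.lower() in lowered:
                labeled.append((sentence, fact.subject, fact.obj, fact.relation))
    return labeled


# Toy data, invented for this example.
facts = [Fact("Douglas Adams", "place_of_birth", "Cambridge")]
sentences = [
    "Douglas Adams was born in Cambridge in 1952.",
    "Douglas Adams wrote The Hitchhiker's Guide to the Galaxy.",
    "Cambridge is a city in England.",
]

training_examples = label_sentences(sentences, facts)
for example in training_examples:
    print(example)
```

Only the first sentence mentions both arguments of the fact, so it is the only one that becomes a training example. The labels obtained this way are noisy, since a sentence can mention both entities without expressing the relation.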
Similar resources
Passage Retrieval for Information Extraction using Distant Supervision
In this paper, we propose a keyword-based passage retrieval algorithm for information extraction, trained by distant supervision. Our goal is to be able to extract attributes of people and organizations more quickly and accurately by first ranking all the potentially relevant passages according to their likelihood of containing the answer and then performing a traditional deeper, slower analysi...
Distant supervision for relation extraction without labeled data
Modern models of relation extraction for tasks like ACE are based on supervised learning of relations from small hand-labeled corpora. We investigate an alternative paradigm that does not require labeled corpora, avoiding the domain dependence of ACE-style algorithms, and allowing the use of corpora of any size. Our experiments use Freebase, a large semantic database of several thousand relation...
Distant Supervision for Entity Linking
Entity linking is an indispensable operation in populating knowledge repositories for information extraction. It aligns a textual entity mention with its corresponding disambiguated entry in a knowledge repository. In this paper, we propose a new paradigm named distantly supervised entity linking (DSEL), in the sense that the disambiguated entities that belong to a huge knowledge rep...
MSIIPL THU’s Slot-Filling Method for TAC-KBP 2015
This paper presents the design and implementation of our first English slot filling system. The slot filling task aims at extracting attribute values of the given entities. The core of the system is a set of supervised per-relation classifiers, trained by a scheme known as distant supervision. We use Freebase and Wikipedia to generate our training query-filler pairs. Annotated Gigaword received f...
Collective Cross-Document Relation Extraction Without Labelled Data
We present a novel approach to relation extraction that integrates information across documents, performs global inference and requires no labelled text. In particular, we tackle relation extraction and entity identification jointly. We use distant supervision to train a factor graph model for relation extraction based on an existing knowledge base (Freebase, derived in parts from Wikipedia). F...